Abstract

Alphabet size of auxiliary random variables in our canonical description is derived. Our
analysis improves upon estimates known in special cases, and generalizes to an arbitrary
multiterminal setup. The salient steps include decomposition of constituent rate polytopes
into orthants, translation of a hyperplane till it becomes tangent to the achievable region
at an extreme point, and derivation of minimum auxiliary alphabet sizes based on
Caratheodory's theorem.

1 Introduction 

The central question in Shannon theory of source coding is the characterization of achievable 
regions in information-theoretic terms. Historically, simple information-theoretic (so-called 
'single-letter') descriptions were shown to completely characterize the achievable regions of 
certain problems, such as Shannon's lossless and lossy coding problems [1, 2], the Slepian- 
Wolf problem [3], the Wyner-Ahlswede-Korner problem [4, 5], the Wyner-Ziv problem [6], 
and the Berger-Yeung problem [7]. Specifically, coincident inner and outer bounds have been 
found for the aforementioned problems. However, in certain other source coding problems, 
including the Berger-Tung and the partial side information problems [8, 9], coincident inner 
and outer bounds have not been found. In this paper, we shall consider a general class of 
inner bounds, which we call canonical, and which may or may not be tight [10]. For example,
our bound coincides with known descriptions in the aforementioned solved problems, as well
as with the Berger-Tung bound known for the Berger-Tung and the partial side information
problems. Further, unlike earlier attempts at unification, such as by Csiszar and Korner
[11], and Han and Kobayashi [12], our canonical bound brings both lossless and lossy coding
under the same framework. Moreover, our bound is tight for (hence solves) a large class
of multiterminal problems [13], generalizing the longstanding single-helper problem [11].
However, at present we shall not focus on conditions for tightness. Instead we shall analyze
an aspect that has historically received very little attention. Note that our inner bounds
involve certain auxiliary variables $\{Z_k\}$ with alphabets $\{\mathcal{Z}_k\}$ (the notation
is made precise subsequently). Alphabet sizes $\{|\mathcal{Z}_k|\}$ play an important role in
practical computation, and hence understanding, of the inner bounds (see, e.g., [14, 15]).
The available results generally estimate $|\mathcal{Z}_k| \le |\mathcal{X}_k| + \text{constant}$,
where $\mathcal{X}_k$ is the given alphabet of the source $X_k$ associated with the auxiliary
variable $Z_k$, and the constant is one or greater. In this paper, we shall derive a tight
bound $|\mathcal{Z}_k| \le |\mathcal{X}_k|$ on such alphabets, thereby facilitating computation.

As alluded to earlier, in different contexts $|\mathcal{Z}_k|$ has been estimated within an
additive constant of $|\mathcal{X}_k|$. For example, we know
$|\mathcal{Z}_k| \le |\mathcal{X}_k| + 2$ for the Wyner-Ahlswede-Korner problem [4, 5],
$|\mathcal{Z}_k| \le |\mathcal{X}_k| + 1$ for the Wyner-Ziv problem [6], and
$|\mathcal{Z}_k| \le |\mathcal{X}_k| + 2$ for the Berger-Yeung problem [7]. In those problems,
there is only one auxiliary variable, and a rate-distortion orthant is varied to create the
desired inner bound (which equals the achievable region). In contrast, the Berger-Tung region
involves two auxiliary variables, and is created by varying a convex core region, which is
more complicated than an orthant [8]. So far, there exists no rigorous analysis of the
alphabet size in this case, but estimates vary between $|\mathcal{Z}_k| \le |\mathcal{X}_k| + 1$
and $|\mathcal{Z}_k| \le |\mathcal{X}_k| + 2$. In an earlier paper [13], we gave an estimate of
$|\mathcal{Z}_k| \le |\mathcal{X}_k| + M$ for the general M-terminal single-helper problem,
where the convex core region is a complicated polytope.

In this backdrop, Gu and Effros estimated $|\mathcal{Z}_k| \le |\mathcal{X}_k|$ for the
Wyner-Ahlswede-Korner problem using a linear programming argument [14]. Later in [15], the
same result was extended to the Wyner-Ziv problem, and to the partial side information
problem [9]. The above result was crucially dependent on the fact that the convex core region
that sweeps out the overall inner bound is an orthant. In contrast, we shall prove the
alphabet size $|\mathcal{Z}_k| \le |\mathcal{X}_k|$ for an arbitrary problem, where the core
region is always a polytope. Specifically, we decompose the polytope into constituent
orthants, and make an orthant-based argument. The above decomposition, apart from being
central to the problem at hand, enhances the geometric understanding of source coding. The
main difficulty here lies in identifying the extreme points exhaustively, thereby identifying
the constituent orthants. We show that there are $M!$ such orthants for an M-source problem.
In order to prove this result, we develop an intricate chain of information-theoretic
results. Further, the orthant-based reasoning borrows an essential notion from a
linear-programming-based argument. In particular, we consider only extreme points, which are
reached by translating any hyperplane, with its direction fixed, away from the origin towards
the achievable region. Our final argument about the alphabet size follows the line of Wyner
and Ziv based on a version of Caratheodory's theorem [6].



2 Canonical Inner Bound 

Consider joint source distribution $p(x_{\{1,\dots,M\}}, s, v)$ governing source variables
$X_{\{1,\dots,M\}}$, decoder side information $S$, and target variable $V$ for lossy
reconstruction/estimation. Also consider $L$ bounded distortion measures
$d_l : \mathcal{V} \times \mathcal{V}_l \to [0, d_{l\max}]$ ($1 \le l \le L$), each with a
possibly distinct reconstruction alphabet $\mathcal{V}_l$. In this setting, the canonical
inner bound $\mathcal{A}^*$ is defined as follows.

Definition 2.1 Define $\mathcal{A}^*$ as the set of $(M + L)$-vectors
$(R_{\{1,\dots,M\}}, D_{\{1,\dots,L\}})$ satisfying the following conditions:

1. auxiliary random variables $Z_{\{1,\dots,M\}}$ (taking values in respective finite alphabets
$\mathcal{Z}_{\{1,\dots,M\}}$) exist such that $Z_m = X_m$, $1 \le m \le J$, and
$(X_{\{1,\dots,M\}}, S, V, Z_{\{J+1,\dots,M\}})$ follows the joint distribution

$$p(x_{\{1,\dots,M\}}, s, v) \prod_{k=J+1}^{M} q_k(z_k \mid x_k), \tag{2.1}$$

for some test channels $\{q_k(z_k \mid x_k)\}_{k=J+1}^{M}$;

2. (rate conditions)

$$I(X_I; Z_I \mid Z_{I^c}, S) \le \sum_{i \in I} R_i, \tag{2.2}$$

where $I^c = \{1,\dots,M\} \setminus I$, and condition (2.2) holds for all nonempty
$I \subseteq \{1,\dots,M\}$;

3. (distortion conditions) mappings
$\psi_l : \mathcal{X}_1 \times \dots \times \mathcal{X}_J \times \mathcal{Z}_{J+1} \times \dots \times \mathcal{Z}_M \times \mathcal{S} \to \mathcal{V}_l$,
$1 \le l \le L$, exist such that

$$E\, d_l\big(V, \psi_l(X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,M\}}, S)\big) \le D_l. \tag{2.3}$$

Lemma 2.2 Every extreme point of $\mathcal{A}^*$ corresponds to some choice of auxiliary
variables $Z_{\{J+1,\dots,M\}}$ with alphabet sizes $|\mathcal{Z}_k| \le |\mathcal{X}_k|$,
$J + 1 \le k \le M$.

The main goal of this paper is to prove Lemma 2.2. The proof is difficult because
$\mathcal{A}^*$ has a complicated geometry. First of all, consider specific auxiliary
variables $Z_{\{J+1,\dots,M\}}$. Then choosing coordinate planes $y_i = R_i = 0$,
$1 \le i \le M$, and $y_{M+l} = D_l = 0$, $1 \le l \le L$, note that distortion equations
(2.3) are all parallel to coordinate planes, and hence form an orthant, whose analysis is
tractable. On the other hand, the rate equations (2.2) are not all parallel to coordinate
planes, leading to a complicated region. In this backdrop, in Sec. 3 we consider the
distortion-extracted rate region given by (2.2), and find a decomposition into a finite
number of orthants. Based on such decomposition, in Sec. 4 we write $\mathcal{A}^*$ as a
finite union of component regions that are formed by orthants. Finally, using such component
regions, the extreme points in Lemma 2.2 are characterized in Sec. 5 with the help of certain
linear combination properties.



3 Geometry of Distortion-Extracted Rate Region

We first consider the rate region formed by rate conditions (2.2). More generally, consider
random variables $(X_{\{1,\dots,M\}}, S, Z_{\{1,\dots,M\}})$ following the joint distribution

$$p(x_{\{1,\dots,M\}}, s) \prod_{k=1}^{M} q_k(z_k \mid x_k). \tag{3.1}$$

In this section, we fix $p(x_{\{1,\dots,M\}}, s)$ as well as all $q_k(z_k \mid x_k)$,
$1 \le k \le M$. Further, define $\mathcal{B}^*$ as the set of rate M-vectors
$R_{\{1,\dots,M\}}$ satisfying

$$I(X_I; Z_I \mid Z_{I^c}, S) \le \sum_{i \in I} R_i, \tag{3.2}$$

where condition (3.2) holds for all nonempty $I \subseteq \{1,\dots,M\}$. We call
$\mathcal{B}^*$ the distortion-extracted rate region because it is delinked from distortion
measures. Of course, we also do not impose the original restrictions $Z_m = X_m$,
$1 \le m \le J$. Next we find the extreme points of $\mathcal{B}^*$. In our analysis, we
shall assume that there is no degeneracy, i.e., any extraneous Markov chain property, not
dictated by the form (3.1) of joint distribution $p$, does not hold. Note that the
nondegeneracy requirement is mild, and met if all random variables under consideration are
statistically dependent.

3.1 Number of Extreme Points: Upper Bound



Lemma 3.1 Suppose there exists rate M-vector $R_{\{1,\dots,M\}} \in \mathcal{B}^*$ such that

$$I(X_I; Z_I \mid Z_{I^c}, S) = \sum_{i \in I} R_i \tag{3.3}$$

$$I(X_{I'}; Z_{I'} \mid Z_{{I'}^c}, S) = \sum_{i \in I'} R_i \tag{3.4}$$

simultaneously hold for some distinct nonempty sets $I, I' \subseteq \{1,\dots,M\}$. Then
either $I \subset I'$ or $I' \subset I$.

The proof is involved, and makes use of a series of new information-theoretic relations
involving $(X_{\{1,\dots,M\}}, S, Z_{\{1,\dots,M\}})$. It is given in Appendix A.

Lemma 3.2 $\mathcal{B}^*$ has at most $M!$ extreme points.

Proof. At each extreme point of $\mathcal{B}^*$, $M$ of the $2^M - 1$ constraints given by
(3.2) are active. Therefore, in view of Lemma 3.1, the number of extreme points of
$\mathcal{B}^*$ is upper bounded by the number of possible ways one can have

$$I^{(1)} \subset I^{(2)} \subset \dots \subset I^{(m)} \subset I^{(m+1)} \subset \dots \subset I^{(M-1)} \subset I^{(M)},$$

where $I^{(m)} \subseteq \{1,\dots,M\}$ with cardinality $|I^{(m)}| = m$, $1 \le m \le M$. To
begin with, we have the only choice $I^{(M)} = \{1,\dots,M\}$. However, given any
$I^{(m+1)}$ ($1 \le m < M$), one can choose $I^{(m)}$ by discarding one of the $m + 1$
elements of $I^{(m+1)}$. Hence one can choose the entire sequence of sets
$\{I^{(m)}\}_{m=1}^{M}$ in $M \times (M - 1) \times \dots \times 2 = M!$ possible ways. Hence
the result. □

Remark 3.3 The above argument does not clarify whether all $M!$ points under consideration
are distinct. Hence we can claim only an upper bound.

3.2 Number of Extreme Points: Lower Bound

Lemma 3.4 The rate M-vector $R_{\{1,\dots,M\}}$ such that

$$R_i = I(X_i; Z_i \mid Z_{\{1,\dots,i-1\}}, S), \quad 1 \le i \le M, \tag{3.5}$$

is an extreme point of $\mathcal{B}^*$.

Remark 3.5 By Lemma A.7 and (3.5), we have

$$I(X_I; Z_I \mid Z_{I^c}, S) \le \sum_{i \in I} I(X_i; Z_i \mid Z_{\{1,\dots,i-1\}}, S) = \sum_{i \in I} R_i \tag{3.6}$$

for all nonempty $I \subseteq \{1,\dots,M\}$. Thus, by (3.2),
$R_{\{1,\dots,M\}} \in \mathcal{B}^*$.

Proof. It is enough to show that the given $R_{\{1,\dots,M\}}$ makes $M$ constraints, given
in (3.2), active. From (3.5), we can write

$$\sum_{i=m}^{M} I(X_i; Z_i \mid Z_{\{1,\dots,i-1\}}, S) = \sum_{i=m}^{M} R_i \tag{3.7}$$

for each $1 \le m \le M$. Further, by Corollary A.6, (3.7) is the same as

$$I(X_{\{m,\dots,M\}}; Z_{\{m,\dots,M\}} \mid Z_{\{1,\dots,m-1\}}, S) = \sum_{i=m}^{M} R_i, \quad 1 \le m \le M, \tag{3.8}$$

which makes $M$ constraints, given in (3.2), active. This completes the proof. □

Now the indices $\{1,\dots,M\}$ in (3.5) can be permuted to obtain $M!$ extreme points.
Importantly, these extreme points are all distinct due to the nondegeneracy assumption.

Corollary 3.6 $\mathcal{B}^*$ has at least $M!$ extreme points.

Remark 3.7 By Lemma 3.2 and Corollary 3.6, $\mathcal{B}^*$ has exactly $M!$ extreme points,
each of which takes the form (3.5) except that the indices $\{1,\dots,M\}$ undergo suitable
permutation. (As it is, (3.5) corresponds to identity permutation.)
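
As an informal numerical sanity check of Lemma 3.4 and Remark 3.7 (not part of the original argument), the following Python sketch builds a small random instance of (3.1) with $M = 3$, binary alphabets and trivial side information $S$. It evaluates the $M!$ candidate vertices obtained by permuting the indices in (3.5), and tests each against all $2^M - 1$ constraints (3.2). The helper names (`build_joint`, `cond_mi`, `vertex`, `in_region`), the alphabet sizes and the random seed are illustrative assumptions.

```python
import itertools
import math
import random

def random_dist(n, rng):
    # Strictly positive random probability vector of length n.
    w = [rng.random() + 0.1 for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]

def build_joint(M, nx, nz, rng):
    # Joint pmf of (x_1..x_M, z_1..z_M) of the form (3.1), with S trivial.
    xs = list(itertools.product(range(nx), repeat=M))
    p_x = dict(zip(xs, random_dist(len(xs), rng)))
    q = [[random_dist(nz, rng) for _ in range(nx)] for _ in range(M)]  # q[k][x][z]
    joint = {}
    for x in xs:
        for z in itertools.product(range(nz), repeat=M):
            pr = p_x[x]
            for k in range(M):
                pr *= q[k][x[k]][z[k]]
            joint[x + z] = pr
    return joint

def entropy(joint, axes):
    # Entropy (nats) of the marginal on the given coordinate positions.
    marg = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[a] for a in axes)
        marg[key] = marg.get(key, 0.0) + pr
    return -sum(v * math.log(v) for v in marg.values() if v > 0)

def cond_mi(joint, a, b, c):
    # I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C).
    h = lambda axes: entropy(joint, sorted(set(axes)))
    return h(a + c) + h(b + c) - h(a + b + c) - h(c)

def vertex(joint, M, perm):
    # Rate vector (3.5) with indices permuted; X_i sits at position i, Z_i at M + i.
    R = [0.0] * M
    for j, i in enumerate(perm):
        R[i] = cond_mi(joint, [i], [M + i], [M + t for t in perm[:j]])
    return R

def in_region(joint, M, R, tol=1e-9):
    # Check every constraint (3.2), one per nonempty subset I.
    for size in range(1, M + 1):
        for I in itertools.combinations(range(M), size):
            comp = [M + i for i in range(M) if i not in I]
            lhs = cond_mi(joint, list(I), [M + i for i in I], comp)
            if lhs > sum(R[i] for i in I) + tol:
                return False
    return True

M = 3
joint = build_joint(M, nx=2, nz=2, rng=random.Random(0))
vertices = [vertex(joint, M, perm) for perm in itertools.permutations(range(M))]
print(all(in_region(joint, M, R) for R in vertices))
```

For a generic (nondegenerate) instance one would expect all candidates to lie in the region and to be pairwise distinct, in line with Remark 3.7.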

4 Decomposition of $\mathcal{A}^*$

Now we move on to the rate-distortion region $\mathcal{A}^*$. Specifically, consider subset
$\mathcal{A}^*(\{q_k\})$ of $\mathcal{A}^*$ defined by (2.1)-(2.3) for given conditional
distributions $q_k(z_k \mid x_k)$, $J + 1 \le k \le M$. Of course,
$\mathcal{A}^* = \bigcup \mathcal{A}^*(\{q_k\})$, where the union is taken over all
$\{q_k\}$. Note that, like $\mathcal{A}^*$, $\mathcal{A}^*(\{q_k\})$ is a subset of the
$(M + L)$-dimensional real space. However, although $\mathcal{A}^*$ is not necessarily
convex, each $\mathcal{A}^*(\{q_k\})$ is convex. Further, every extreme point of
$\mathcal{A}^*$ is an extreme point of some $\mathcal{A}^*(\{q_k\})$. Finally, notice that
the projection of $\mathcal{A}^*(\{q_k\})$ onto the space of $M$ rate coordinates is the same
as $\mathcal{B}^*$ with the choice $Z_m = X_m$, $1 \le m \le J$ (which does not violate our
nondegeneracy assumption), whereas the projection onto the space of $L$ distortion
coordinates is simply a suitable orthant. Therefore, by Remark 3.7,
$\mathcal{A}^*(\{q_k\})$ possesses $M!$ extreme points, one of which, denoted
$(R^0_{\{1,\dots,M\}}(\{q_k\}), D^0_{\{1,\dots,L\}}(\{q_k\}))$, is specified by (from (3.5)
and (2.3))

$$R^0_i(\{q_k\}) = I(X_i; Z_i \mid Z_{\{1,\dots,i-1\}}, S), \quad 1 \le i \le M \tag{4.1}$$

$$D^0_l(\{q_k\}) = \min_{\psi_l} E\, d_l\big(V, \psi_l(Z_{\{1,\dots,M\}}, S)\big), \quad 1 \le l \le L, \tag{4.2}$$

where $Z_m = X_m$, $1 \le m \le J$. In general, any extreme point
$(R^{\pi}_{\{1,\dots,M\}}(\{q_k\}), D^{\pi}_{\{1,\dots,L\}}(\{q_k\}))$ is generated by a
suitable permutation (bijection) $P^{\pi} : \{1,\dots,M\} \to \{1,\dots,M\}$, where $\pi$
takes $M!$ values, say, $\{0,\dots,M! - 1\}$ (we set $P^0$ to be the identity permutation).
In other words, in (4.1) and (4.2), each occurrence of index $i$ is replaced by
$P^{\pi}(i)$. As regards dependence on $\pi$, vectors $R^{\pi}_{\{1,\dots,M\}}(\{q_k\})$ are
all distinct (as mentioned earlier), whereas vectors $D^{\pi}_{\{1,\dots,L\}}(\{q_k\})$ are
all identical.

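At this point, denote the orthant specified by
$(R^{\pi}_{\{1,\dots,M\}}(\{q_k\}), D^{\pi}_{\{1,\dots,L\}}(\{q_k\}))$ as

$$\mathcal{A}^*_{\pi}(\{q_k\}) = \big\{ (R_{\{1,\dots,M\}}, D_{\{1,\dots,L\}}) : R^{\pi}_i(\{q_k\}) \le R_i,\ 1 \le i \le M;\ D^{\pi}_l(\{q_k\}) \le D_l,\ 1 \le l \le L \big\} \tag{4.3}$$

for $0 \le \pi \le M! - 1$, and all possible $\{q_k\}$. Clearly,

$$\mathcal{A}^*(\{q_k\}) = \mathrm{conv}\Big( \bigcup_{\pi=0}^{M!-1} \mathcal{A}^*_{\pi}(\{q_k\}) \Big),$$

where conv(·) indicates 'convex hull of'. Consequently, we have

$$\mathrm{conv}(\mathcal{A}^*) = \mathrm{conv}\Big( \bigcup_{\{q_k\}} \mathcal{A}^*(\{q_k\}) \Big) = \mathrm{conv}\Big( \bigcup_{\{q_k\}} \bigcup_{\pi=0}^{M!-1} \mathcal{A}^*_{\pi}(\{q_k\}) \Big). \tag{4.4}$$

Now, interchanging the union operations in the last term in (4.4), and defining

$$\mathcal{A}^*_{\pi} = \bigcup_{\{q_k\}} \mathcal{A}^*_{\pi}(\{q_k\}), \tag{4.5}$$

we obtain

$$\mathrm{conv}(\mathcal{A}^*) = \mathrm{conv}\Big( \bigcup_{\pi=0}^{M!-1} \mathcal{A}^*_{\pi} \Big). \tag{4.6}$$

In view of (4.6), every extreme point of $\mathcal{A}^*$ is an extreme point of some
$\mathcal{A}^*_{\pi}$. Consequently, in order to establish Lemma 2.2, it is enough to show
the following.

Lemma 4.1 Every extreme point of $\mathcal{A}^*_{\pi}$ ($0 \le \pi \le M! - 1$) corresponds
to some choice of auxiliary variables $Z_{\{J+1,\dots,M\}}$ with alphabet sizes
$|\mathcal{Z}_k| \le |\mathcal{X}_k|$, $J + 1 \le k \le M$.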

The rest of the note is devoted to the proof of Lemma 4.1. In particular, we shall prove
the result only for $\pi = 0$. Our analysis extends to other values of $\pi$ in a
straightforward manner. At present, consider the real $(M + L)$-space, and let $y_i = 0$,
$1 \le i \le M + L$, be the coordinate planes. In this space, an $(M + L - 1)$-hyperplane

$$\sum_{i=1}^{M+L} a_i y_i = c \tag{4.7}$$

is specified by the direction cosine vector $(a_1, \dots, a_{M+L})$ subject to
$\sum_{i=1}^{M+L} a_i^2 = 1$, and the intercept $c$. At this point, identifying $y_i = R_i$,
$1 \le i \le M$, and $y_{M+l} = D_l$, $1 \le l \le L$, note that $\mathcal{A}^*_0$ lies in
the nonnegative orthant. Further, every extreme point of $\mathcal{A}^*_0$ has a tangent
hyperplane of the form (4.7), whose direction cosines and intercept are nonnegative
($a_i \ge 0$, $1 \le i \le M + L$; $c \ge 0$). Conversely, for any $(a_1, \dots, a_{M+L})$
with $a_i \ge 0$, $1 \le i \le M + L$, there exists $c \ge 0$ such that the hyperplane (4.7)
is tangent to $\mathcal{A}^*_0$ at some extreme point. Hence we obtain the following result.

Corollary 4.2 The set of extreme points of $\mathcal{A}^*_0$ is given by

$$\Big\{ \arg\min_{(R_{\{1,\dots,M\}}, D_{\{1,\dots,L\}}) \in \mathcal{A}^*_0} \Big( \sum_{i=1}^{M} a_i R_i + \sum_{l=1}^{L} a_{M+l} D_l \Big) : \sum_{i=1}^{M+L} a_i^2 = 1;\ a_i \ge 0,\ 1 \le i \le M + L \Big\}. \tag{4.8}$$

By (4.3) and (4.5), every minimizer in (4.8) is of the form
$(R^0_{\{1,\dots,M\}}(\{q_k\}), D^0_{\{1,\dots,L\}}(\{q_k\}))$ for some $\{q_k\}$. Further,
using $Z_m = X_m$, $1 \le m \le J$, in (4.1), notice that $R^0_{\{1,\dots,J\}}(\{q_k\})$
does not depend on $\{q_k\}$. Hence we set $a_1 = \dots = a_J = 0$ without loss of generality
(and scale the remaining direction cosines appropriately) to obtain the following.

Corollary 4.3 The set of extreme points of $\mathcal{A}^*_0$ is given by the set of
rate-distortion vectors $(R^0_{\{1,\dots,M\}}(\{q_k\}), D^0_{\{1,\dots,L\}}(\{q_k\}))$ such
that $\{q_k\}$ minimizes

$$\sum_{i=J+1}^{M} a_i R^0_i(\{q_k\}) + \sum_{l=1}^{L} a_{M+l} D^0_l(\{q_k\}), \tag{4.9}$$

and direction cosine vector $a_{\{J+1,\dots,M+L\}}$ varies through admissible values.

Note that Lemma 4.1 follows for $\pi = 0$ (corresponding to identity permutation $P^0$), if
we lose no generality by restricting to minimizers $\{q_k\}$ of (4.9) that satisfy
$|\mathcal{Z}_k| \le |\mathcal{X}_k|$, $J + 1 \le k \le M$. We shall show that the last
condition indeed holds as a consequence of certain linear combination properties.

5 Linear Combination Properties 
5.1 Change of Variables 

For $J + 1 \le k \le M$, denote marginal distributions of $X_k$ and $Z_k$ by $p_k(x_k)$ and
$p'_k(z_k)$, respectively, and conditional distribution of $X_k$ given $Z_k$ by
$q'_k(x_k \mid z_k)$. Note that $p_k(x_k)$ is specified by marginalizing the source
distribution $p(x_{\{1,\dots,M\}}, s, v)$. Further, by Bayes' rule, we have
$p_k(x_k) q_k(z_k \mid x_k) = p'_k(z_k) q'_k(x_k \mid z_k)$. Of course, one completely
specifies both $p'_k$ and $q'_k$ by specifying $q_k$. At the same time, rather than varying
$q_k$, we can equivalently vary the pair $(p'_k, q'_k)$ subject to the admissibility
condition

$$p_k(x_k) = \sum_{z_k \in \mathcal{Z}_k} p'_k(z_k) q'_k(x_k \mid z_k). \tag{5.1}$$

Apart from the above specific notation, we shall denote by '$r$' generic distributions. For
example, $r(y, u \mid w)$ indicates the joint distribution of $(Y, U)$ conditioned on $W$.

At this point, consider identity permutation $P^0$ of $\{1,\dots,M\}$, and, correspondingly,
the set $\mathcal{A}^*_0(\{p'_k, q'_k\})$. Here, we recall that variation of $\{q_k\}$, and
variation of $\{p'_k, q'_k\}$ subject to (5.1) are equivalent, and, in a slight abuse of
notation, denote by $\mathcal{A}^*_{\pi}(\{p'_k, q'_k\})$ the set function of
$\{p'_k, q'_k\}$ equalling $\mathcal{A}^*_{\pi}(\{q_k\})$. Subsequently, we shall make
analogous change of variables without explicit mention. Using $Z_m = X_m$, $1 \le m \le J$,
in (4.1) and (4.2), we have

$$R^0_i(\{p'_k, q'_k\}) = H(X_i \mid X_{\{1,\dots,i-1\}}, S), \quad 1 \le i \le J \tag{5.2}$$

$$R^0_i(\{p'_k, q'_k\}) = I(X_i; Z_i \mid X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,i-1\}}, S), \quad J + 1 \le i \le M \tag{5.3}$$

$$D^0_l(\{p'_k, q'_k\}) = \min_{\psi_l} E\, d_l\big(V, \psi_l(X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,M\}}, S)\big), \quad 1 \le l \le L. \tag{5.4}$$

As mentioned earlier, and by (5.2), $R^0_{\{1,\dots,J\}}(\{p'_k, q'_k\})$ does not depend on
$\{p'_k, q'_k\}$. However, the remaining rate and distortion components, given by (5.3) and
(5.4), do exhibit dependence on $\{p'_k, q'_k\}$.

Next we isolate the dependence of individual rate as well as distortion component on
individual pair $(p'_k, q'_k)$, while keeping the rest of the pairs fixed. We highlight the
dependence on $(p'_k, q'_k)$ by dropping the rest of the pairs
$\{(p'_{\kappa}, q'_{\kappa})\}_{\kappa \ne k}$ from the argument. Specifically, we show that
each $R^0_i(p'_k, q'_k)$ ($k \le i \le M$) and each $D^0_l(p'_k, q'_k)$ ($1 \le l \le L$) is
a linear combination of functionals of $q'_k(\cdot \mid z_k)$'s weighted by $p'_k(z_k)$'s.
Here $q'_k(\cdot \mid z_k)$ denotes the probability vector
$\{q'_k(x_k \mid z_k)\}_{x_k \in \mathcal{X}_k}$ for a given $z_k \in \mathcal{Z}_k$.
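
The change of variables above can be illustrated with a minimal Python sketch, using exact rational arithmetic. Given a source marginal and a test channel, it forms the reverse pair by Bayes' rule, checks the admissibility condition (5.1), and recovers the original channel. The function names (`to_reverse_pair`, `to_forward_channel`, `satisfies_5_1`) and the numerical values are hypothetical, chosen only for illustration; the sketch also assumes every output symbol has positive probability.

```python
from fractions import Fraction as F

def to_reverse_pair(p, q):
    # p[x] = p_k(x); q[x][z] = q_k(z|x). Returns p_rev[z] = p'_k(z), q_rev[z][x] = q'_k(x|z).
    n_x, n_z = len(p), len(q[0])
    p_rev = [sum(p[x] * q[x][z] for x in range(n_x)) for z in range(n_z)]
    q_rev = [[p[x] * q[x][z] / p_rev[z] for x in range(n_x)] for z in range(n_z)]
    return p_rev, q_rev

def to_forward_channel(p, p_rev, q_rev):
    # Bayes' rule in the other direction: q_k(z|x) = p'_k(z) q'_k(x|z) / p_k(x).
    return [[p_rev[z] * q_rev[z][x] / p[x] for z in range(len(p_rev))]
            for x in range(len(p))]

def satisfies_5_1(p, p_rev, q_rev):
    # Admissibility condition (5.1): mixing the reverse channel recovers p_k.
    return all(sum(p_rev[z] * q_rev[z][x] for z in range(len(p_rev))) == p[x]
               for x in range(len(p)))

p = [F(1, 2), F(1, 3), F(1, 6)]                       # illustrative p_k
q = [[F(3, 4), F(1, 4)], [F(1, 2), F(1, 2)], [F(0), F(1)]]  # illustrative q_k(z|x)
p_rev, q_rev = to_reverse_pair(p, q)
print(p_rev, satisfies_5_1(p, p_rev, q_rev))
```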

5.2 Rate Components 

Consider $J + 1 \le k \le i \le M$. From (5.3), we have

$$R^0_i(p'_k, q'_k) = I(X_i; Z_i \mid X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,i-1\}}, S)$$

$$= H(X_i \mid X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,i-1\}}, S) - H(X_i \mid X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,i\}}, S). \tag{5.5}$$

Further, denote by $\Delta_{\mathcal{X}_k}$ the $(|\mathcal{X}_k| - 1)$-dimensional
probability simplex, i.e., the set of probability vectors defined on $\mathcal{X}_k$.

Lemma 5.1 If $J + 1 \le k \le i \le M$, then

$$H(X_i \mid X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,i-1\}}, S) = \sum_{z_k \in \mathcal{Z}_k} p'_k(z_k)\, \Phi^{(1)}_{ki}\big(q'_k(\cdot \mid z_k)\big)$$

for some functional $\Phi^{(1)}_{ki}$ defined on $\Delta_{\mathcal{X}_k}$.

Proof. First note that, if $i = k$, then the target entropy does not depend on
$(p'_k, q'_k)$, and reduces to a trivial constant. A more interesting situation arises when
$i > k$. In this case, verify that $k \in \{J + 1, \dots, i - 1\}$. Now write
$U = (X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,i-1\} \setminus \{k\}}, S)$, and verify that

$$Z_k \to X_k \to (U, X_i) \tag{5.6}$$

forms a Markov chain. Hence we obtain

$$H(X_i \mid X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,i-1\}}, S) = H(X_i \mid Z_k, U)$$

$$= -\sum_{(x_i, z_k, u)} r(x_i, z_k, u) \log \frac{r(x_i, z_k, u)}{r(z_k, u)}$$

$$= -\sum_{(x_i, z_k, u)} \sum_{x_k} p'_k(z_k) q'_k(x_k \mid z_k) r(x_i, u \mid x_k) \log \frac{\sum_{x'_k} p'_k(z_k) q'_k(x'_k \mid z_k) r(x_i, u \mid x'_k)}{\sum_{x'_k} p'_k(z_k) q'_k(x'_k \mid z_k) r(u \mid x'_k)} \tag{5.7}$$

$$= -\sum_{z_k} p'_k(z_k) \sum_{(x_i, u)} \sum_{x_k} q'_k(x_k \mid z_k) r(x_i, u \mid x_k) \log \frac{\sum_{x'_k} q'_k(x'_k \mid z_k) r(x_i, u \mid x'_k)}{\sum_{x'_k} q'_k(x'_k \mid z_k) r(u \mid x'_k)} \tag{5.8}$$

$$= \sum_{z_k} p'_k(z_k)\, \Phi^{(1)}_{ki}\big(q'_k(\cdot \mid z_k)\big). \tag{5.9}$$

Here (5.7) follows by noting Markov chain (5.6), and writing

$$r(z_k, x_i, u) = \sum_{x_k} r(z_k, x_k, x_i, u) = \sum_{x_k} p'_k(z_k) q'_k(x_k \mid z_k) r(x_i, u \mid x_k)$$

$$r(z_k, u) = \sum_{x_k} p'_k(z_k) q'_k(x_k \mid z_k) r(u \mid x_k).$$

Further, (5.8) follows by rearranging, and by canceling out $p'_k(z_k)$ from the numerator and
denominator of the argument of 'log'. Finally, (5.9) follows by defining the functional

$$\Phi^{(1)}_{ki}(t) = -\sum_{(x_i, u)} \sum_{x_k} t(x_k) r(x_i, u \mid x_k) \log \frac{\sum_{x'_k} t(x'_k) r(x_i, u \mid x'_k)}{\sum_{x'_k} t(x'_k) r(u \mid x'_k)},$$

where $t = \{t(x_k) : x_k \in \mathcal{X}_k\}$ is any probability vector on $\mathcal{X}_k$. □
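Adopting a similar approach, we also obtain the following.

Lemma 5.2 If $J + 1 \le k \le i \le M$, then

$$H(X_i \mid X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,i\}}, S) = \sum_{z_k \in \mathcal{Z}_k} p'_k(z_k)\, \Phi^{(2)}_{ki}\big(q'_k(\cdot \mid z_k)\big)$$

for some functional $\Phi^{(2)}_{ki}$ defined on $\Delta_{\mathcal{X}_k}$.

Noting (5.5), combining Lemmas 5.1 and 5.2, and writing
$\Phi_{ki} = \Phi^{(1)}_{ki} - \Phi^{(2)}_{ki}$, we obtain the following corollary.

Corollary 5.3 If $J + 1 \le k \le i \le M$, then

$$R^0_i(p'_k, q'_k) = \sum_{z_k \in \mathcal{Z}_k} p'_k(z_k)\, \Phi_{ki}\big(q'_k(\cdot \mid z_k)\big)$$

for some functional $\Phi_{ki}$ defined on $\Delta_{\mathcal{X}_k}$.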
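
The linear-combination identity of Lemma 5.1 can be checked numerically on a toy instance. The Python sketch below draws a random reverse pair and a random kernel $r(x_i, u \mid x_k)$, computes $H(X_i \mid Z_k, U)$ directly from the joint distribution, and compares it with the weighted sum of the functional defined in the proof of Lemma 5.1. The names `phi1` and `direct_entropy`, the alphabet sizes and the seed are assumptions made for illustration only.

```python
import math
import random

def random_dist(n, rng):
    # Strictly positive random probability vector of length n.
    w = [rng.random() + 0.1 for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]

def phi1(t, r):
    # Functional from the proof of Lemma 5.1; t[x] is a probability vector on X_k,
    # r[x][xi][u] = r(x_i, u | x_k).
    n_x, n_i, n_u = len(r), len(r[0]), len(r[0][0])
    total = 0.0
    for u in range(n_u):
        mix = [sum(t[x] * r[x][xi][u] for x in range(n_x)) for xi in range(n_i)]
        den = sum(mix)  # equals sum_x t(x) r(u|x)
        total -= sum(m * math.log(m / den) for m in mix if m > 0)
    return total

def direct_entropy(p_rev, q_rev, r):
    # H(X_i | Z_k, U) = H(Z_k, X_i, U) - H(Z_k, U), from the joint r(z_k, x_i, u).
    n_z, n_x, n_i, n_u = len(p_rev), len(r), len(r[0]), len(r[0][0])
    h_joint, h_marg = 0.0, 0.0
    for z in range(n_z):
        for u in range(n_u):
            cells = [p_rev[z] * sum(q_rev[z][x] * r[x][xi][u] for x in range(n_x))
                     for xi in range(n_i)]
            h_joint -= sum(c * math.log(c) for c in cells if c > 0)
            marg = sum(cells)
            if marg > 0:
                h_marg -= marg * math.log(marg)
    return h_joint - h_marg

rng = random.Random(7)
n_x, n_z, n_i, n_u = 3, 4, 2, 3
p_rev = random_dist(n_z, rng)                       # p'_k(z_k)
q_rev = [random_dist(n_x, rng) for _ in range(n_z)]  # q'_k(.|z_k)
r = []
for _ in range(n_x):
    flat = random_dist(n_i * n_u, rng)               # joint of (x_i, u) given x_k
    r.append([flat[xi * n_u:(xi + 1) * n_u] for xi in range(n_i)])

lhs = direct_entropy(p_rev, q_rev, r)
rhs = sum(p_rev[z] * phi1(q_rev[z], r) for z in range(n_z))
print(abs(lhs - rhs) < 1e-9)
```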



5.3 Distortion Components 

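Lemma 5.4 For $J + 1 \le k \le M$, and $1 \le l \le L$, we have

$$D^0_l(p'_k, q'_k) = \sum_{z_k \in \mathcal{Z}_k} p'_k(z_k)\, \Psi_{kl}\big(q'_k(\cdot \mid z_k)\big)$$

for some functional $\Psi_{kl}$ defined on $\Delta_{\mathcal{X}_k}$.

Proof. Write $U = (X_{\{1,\dots,J\}}, Z_{\{J+1,\dots,M\} \setminus \{k\}}, S)$, and verify
that

$$Z_k \to X_k \to (U, V) \tag{5.10}$$

forms a Markov chain. Hence from (5.4), we obtain

$$D^0_l(p'_k, q'_k) = \min_{\psi_l} E\, d_l\big(V, \psi_l(U, Z_k)\big)$$

$$= \min_{\psi_l} \sum_{(v, u, z_k)} r(u, v, z_k)\, d_l\big(v, \psi_l(u, z_k)\big)$$

$$= \min_{\psi_l} \sum_{(v, u, z_k)} \sum_{x_k} p'_k(z_k) q'_k(x_k \mid z_k) r(u, v \mid x_k)\, d_l\big(v, \psi_l(u, z_k)\big) \tag{5.11}$$

$$= \sum_{z_k} p'_k(z_k) \sum_{u} \min_{\hat{v}_l \in \mathcal{V}_l} \sum_{(v, x_k)} q'_k(x_k \mid z_k) r(u, v \mid x_k)\, d_l(v, \hat{v}_l) \tag{5.12}$$

$$= \sum_{z_k} p'_k(z_k)\, \Psi_{kl}\big(q'_k(\cdot \mid z_k)\big). \tag{5.13}$$

Here (5.11) follows by noting Markov chain (5.10), and writing

$$r(u, v, z_k) = \sum_{x_k} r(u, v, x_k, z_k) = \sum_{x_k} p'_k(z_k) q'_k(x_k \mid z_k) r(u, v \mid x_k).$$

Further, (5.12) follows by rearranging. Finally, (5.13) follows by defining the functional

$$\Psi_{kl}(t) = \sum_{u} \min_{\hat{v}_l \in \mathcal{V}_l} \sum_{(v, x_k)} t(x_k) r(u, v \mid x_k)\, d_l(v, \hat{v}_l),$$

where $t = \{t(x_k) : x_k \in \mathcal{X}_k\}$ is any probability vector on $\mathcal{X}_k$. □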
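
The rearrangement behind Lemma 5.4 (a minimum over reconstruction maps becomes a weighted sum of per-symbol minima) can be checked by brute force on a toy instance. The Python sketch below enumerates every map $(u, z_k) \mapsto \hat{v}_l$ and compares the best expected distortion with the weighted sum of the per-symbol functional. The names `psi_functional` and `brute_force_distortion`, the Hamming distortion, the alphabet sizes and the seed are illustrative assumptions.

```python
import itertools
import random

def random_dist(n, rng):
    # Strictly positive random probability vector of length n.
    w = [rng.random() + 0.1 for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]

def psi_functional(t, r, d):
    # sum_u min_{v_hat} sum_{v,x} t(x) r(u,v|x) d(v,v_hat);
    # t[x] is a probability vector on X_k, r[x][u][v] = r(u,v|x_k), d[v][v_hat].
    n_x, n_u, n_v, n_hat = len(r), len(r[0]), len(r[0][0]), len(d[0])
    total = 0.0
    for u in range(n_u):
        total += min(
            sum(t[x] * r[x][u][v] * d[v][vh] for x in range(n_x) for v in range(n_v))
            for vh in range(n_hat))
    return total

def brute_force_distortion(p_rev, q_rev, r, d):
    # Minimum of E d(V, psi(U, Z_k)) over all maps psi: (u, z) -> v_hat.
    n_z, n_x, n_u, n_v, n_hat = len(p_rev), len(r), len(r[0]), len(r[0][0]), len(d[0])
    cells = list(itertools.product(range(n_u), range(n_z)))
    best = float("inf")
    for choice in itertools.product(range(n_hat), repeat=len(cells)):
        psi = dict(zip(cells, choice))
        cost = sum(p_rev[z] * q_rev[z][x] * r[x][u][v] * d[v][psi[(u, z)]]
                   for z in range(n_z) for x in range(n_x)
                   for u in range(n_u) for v in range(n_v))
        best = min(best, cost)
    return best

rng = random.Random(1)
n_x, n_z, n_u, n_v = 3, 2, 2, 2
p_rev = random_dist(n_z, rng)                        # p'_k(z_k)
q_rev = [random_dist(n_x, rng) for _ in range(n_z)]   # q'_k(.|z_k)
r = []
for _ in range(n_x):
    flat = random_dist(n_u * n_v, rng)                # joint of (u, v) given x_k
    r.append([flat[u * n_v:(u + 1) * n_v] for u in range(n_u)])
d = [[0.0, 1.0], [1.0, 0.0]]                          # Hamming distortion (assumed)

lhs = brute_force_distortion(p_rev, q_rev, r, d)
rhs = sum(p_rev[z] * psi_functional(q_rev[z], r, d) for z in range(n_z))
print(abs(lhs - rhs) < 1e-9)
```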



5.4 Minimization of Linear Combination 

At this time, consider the setting of Corollary 4.3, i.e., $a_1 = \dots = a_J = 0$.

Lemma 5.5 Pick any $J + 1 \le k \le M$, and fix admissible $a_{\{J+1,\dots,M+L\}}$ and
$\{(p'_{\kappa}, q'_{\kappa})\}_{\kappa \ne k}$ in an arbitrary manner. Then there exists a
minimizer $(p'_k, q'_k)$ of the problem

$$\min_{(p'_k, q'_k)\ \text{subject to (5.1)}} \ \sum_{i=J+1}^{M} a_i R^0_i(p'_k, q'_k) + \sum_{l=1}^{L} a_{M+l} D^0_l(p'_k, q'_k)$$

such that $p'_k(z_k)$ is defined on alphabet $\mathcal{Z}_k$ with size
$|\mathcal{Z}_k| \le |\mathcal{X}_k|$ (and hence $q'_k(x_k \mid z_k)$ is specified by at most
$|\mathcal{X}_k|$ probability vectors defined on $\mathcal{X}_k$).

Proof. Given $a_{\{J+1,\dots,M+L\}}$ and $\{(p'_{\kappa}, q'_{\kappa})\}_{\kappa \ne k}$,
consider

$$\omega = \sum_{i=J+1}^{M} a_i R^0_i(p'_k, q'_k) + \sum_{l=1}^{L} a_{M+l} D^0_l(p'_k, q'_k),$$

and denote by $\Omega$ the set of admissible values of $\omega$. Further, denote
$\omega^* = \min_{\omega \in \Omega} \omega$. Now, by Corollary 5.3 and Lemma 5.4, we have

$$\omega = \sum_{z_k \in \mathcal{Z}_k} p'_k(z_k)\, \Theta\big(q'_k(\cdot \mid z_k)\big), \tag{5.14}$$

where $\Theta(t) = \sum_{i=J+1}^{M} a_i \Phi_{ki}(t) + \sum_{l=1}^{L} a_{M+l} \Psi_{kl}(t)$ is
defined on $\Delta_{\mathcal{X}_k}$. Note that $\Theta$ is continuous and bounded, and the
$(|\mathcal{X}_k| - 1)$-dimensional probability simplex $\Delta_{\mathcal{X}_k}$ is compact.
Now consider the mapping $t \to (t, \Theta(t))$, and denote by $\mathcal{S}$ the image of
$\Delta_{\mathcal{X}_k}$ under this mapping. Of course, $\mathcal{S}$ is connected and
compact, and $\mathcal{S}$ has dimensionality $|\mathcal{X}_k|$. Therefore, by the
Fenchel-Eggleston strengthening of Caratheodory's theorem, any point in
$\mathrm{conv}(\mathcal{S})$ is a convex combination of at most $|\mathcal{X}_k|$ points in
$\mathcal{S}$. Further, in view of (5.1) and (5.14), any pair $(p_k, \omega)$ belongs to
$\mathrm{conv}(\mathcal{S})$. In particular, set $\Omega$ of admissible $\omega$, where
source distribution $p_k$ is fixed by problem statement, is given by

$$\Omega = \{\omega : (p_k, \omega) \in \mathrm{conv}(\mathcal{S})\}.$$

In other words, every admissible $\omega \in \Omega$, including $\omega^*$, can be expressed
as in (5.14) with $|\mathcal{Z}_k| \le |\mathcal{X}_k|$. This completes the proof. □
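
The conclusion of Lemma 5.5 can be illustrated constructively when only finitely many candidate vectors $q'_k(\cdot \mid z_k)$ are in play. The Python sketch below is not the Fenchel-Eggleston argument used in the proof. It is a standard support-reduction step in the spirit of the linear-programming viewpoint mentioned in the introduction: weights are moved along a null-space direction of the candidate vectors, so that the mixture in (5.1) is preserved and the objective in (5.14) does not increase, until at most $|\mathcal{X}_k|$ weights remain positive. The function names (`null_vector`, `reduce_support`) and all numerical values, including the stand-in values for $\Theta$, are hypothetical.

```python
from fractions import Fraction as F

def null_vector(cols):
    # Nonzero d with sum_j d[j] * cols[j] = 0; requires len(cols) > len(cols[0]).
    n, s = len(cols[0]), len(cols)
    A = [[cols[j][i] for j in range(s)] for i in range(n)]  # n x s matrix
    pivots, row = [], 0
    for c in range(s):  # Gauss-Jordan elimination over the rationals
        pr = next((i for i in range(row, n) if A[i][c] != 0), None)
        if pr is None:
            continue
        A[row], A[pr] = A[pr], A[row]
        A[row] = [v / A[row][c] for v in A[row]]
        for i in range(n):
            if i != row and A[i][c] != 0:
                A[i] = [vi - A[i][c] * vr for vi, vr in zip(A[i], A[row])]
        pivots.append(c)
        row += 1
        if row == n:
            break
    free = next(c for c in range(s) if c not in pivots)
    d = [F(0)] * s
    d[free] = F(1)
    for i, c in enumerate(pivots):
        d[c] = -A[i][free]
    return d

def reduce_support(atoms, weights, values):
    # atoms[j]: candidate vector q'(.|z_j); weights[j] = p'(z_j); values[j]: objective at atom j.
    w = list(weights)
    n_x = len(atoms[0])
    while True:
        support = [j for j, wj in enumerate(w) if wj > 0]
        if len(support) <= n_x:
            return w
        d = null_vector([atoms[j] for j in support])
        # Orient d so that the weighted objective does not increase.
        if sum(dj * values[j] for dj, j in zip(d, support)) > 0:
            d = [-dj for dj in d]
        # Largest step keeping all weights nonnegative; it zeroes at least one weight.
        step = min(w[j] / -dj for dj, j in zip(d, support) if dj < 0)
        for dj, j in zip(d, support):
            w[j] += step * dj

atoms = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 4), F(1, 2), F(1, 4)],
    [F(1, 4), F(1, 4), F(1, 2)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 2), F(1, 2), F(0)],
    [F(0), F(1, 2), F(1, 2)],
]
weights = [F(1, 6)] * 6
values = [F(3), F(1), F(4), F(1), F(5), F(9)]  # stand-ins for Theta at each atom

def mixture(ws):
    return [sum(w * a[x] for w, a in zip(ws, atoms)) for x in range(3)]

new_weights = reduce_support(atoms, weights, values)
print(new_weights)
```

Exact rational arithmetic is used so that the mixture constraint is preserved exactly and at least one weight reaches zero in every pass.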

Corollary 5.6 For any admissible $a_{\{J+1,\dots,M+L\}}$, there exists a minimizer
$\{p'_k, q'_k\}$ of the problem

$$\min_{\{p'_k, q'_k\}\ \text{subject to (5.1)}} \ \sum_{i=J+1}^{M} a_i R^0_i(\{p'_k, q'_k\}) + \sum_{l=1}^{L} a_{M+l} D^0_l(\{p'_k, q'_k\})$$

such that each $p'_k(z_k)$ ($J + 1 \le k \le M$) is defined on alphabet $\mathcal{Z}_k$ with
size $|\mathcal{Z}_k| \le |\mathcal{X}_k|$ (and hence each $q'_k(x_k \mid z_k)$ is specified
by at most $|\mathcal{X}_k|$ probability vectors defined on $\mathcal{X}_k$).

Proof. We shall prove the result by contradiction. Suppose there exists admissible
$a_{\{J+1,\dots,M+L\}}$ such that a minimizer $\{p'_k, q'_k\}$ with
$|\mathcal{Z}_k| \le |\mathcal{X}_k|$, $J + 1 \le k \le M$, does not exist. Pick such
$a_{\{J+1,\dots,M+L\}}$, and compute the minimum value of the objective function. By
supposition, any corresponding minimizer $\{p'_k, q'_k\}$ has
$|\mathcal{Z}_i| > |\mathcal{X}_i|$ for some $J + 1 \le i \le M$. Starting from such a
minimizer, we now undertake a procedure such that the value of the objective function does
not increase at any stage. Specifically, choose $k = J + 1$, and keep
$\{(p'_{\kappa}, q'_{\kappa})\}_{\kappa \ne k}$ fixed. By Lemma 5.5, the objective function is
no greater than the minimum value for some new choice $(p'_k, q'_k)$ with
$|\mathcal{Z}_k| \le |\mathcal{X}_k|$. Update $(p'_k, q'_k)$ to this new choice. Next choose
$k = J + 2$, keep $\{(p'_{\kappa}, q'_{\kappa})\}_{\kappa \ne k}$ fixed, and update
$(p'_k, q'_k)$ (in view of Lemma 5.5) such that the objective function is no greater than the
minimum value, yet $|\mathcal{Z}_k| \le |\mathcal{X}_k|$. Continue this procedure till
$k = M$. Finally, we have a new $\{(p'_k, q'_k)\}$ with
$|\mathcal{Z}_k| \le |\mathcal{X}_k|$, $J + 1 \le k \le M$, such that the corresponding
objective function is no greater than the minimum value. This is a contradiction. □

Proofs of Lemmas 4.1 and 2.2: Note that $\{q_k\}$ is completely determined by
$\{p'_k, q'_k\}$ by Bayes' rule

$$q_k(z_k \mid x_k) = p'_k(z_k) q'_k(x_k \mid z_k) / p_k(x_k),$$

because $p_k(x_k)$ is specified by the problem statement. Therefore, by Corollary 5.6, we lose
no generality by restricting to minimizers $\{q_k\}$ of (4.9) that satisfy
$|\mathcal{Z}_k| \le |\mathcal{X}_k|$, $J + 1 \le k \le M$. Hence Lemma 4.1 follows for
$\pi = 0$ (corresponding to identity permutation $P^0$). Further, a similar analysis
straightforwardly establishes Lemma 4.1 for each $1 \le \pi \le M! - 1$. Finally, in view of
(4.6), Lemma 2.2 follows. □



A Proof of Lemma 3.1 

Lemma A.1 Suppose nonempty sets $I, I' \subseteq \{1,\dots,M\}$ are disjoint. Then

$$I(X_I; Z_I \mid Z_{(I \cup I')^c}, S) = I(X_I; Z_I \mid Z_{I^c}, S) + I(Z_I; Z_{I'} \mid Z_{(I \cup I')^c}, S). \tag{A.1}$$

Proof. First expand

$$I(Z_I; X_I, Z_{I'} \mid Z_{(I \cup I')^c}, S) = I(Z_I; Z_{I'} \mid Z_{(I \cup I')^c}, S) + I(Z_I; X_I \mid Z_{I^c}, S), \tag{A.2}$$

applying the chain rule of mutual information. Expand the same quantity again, now applying
the chain rule in a different order:

$$I(Z_I; X_I, Z_{I'} \mid Z_{(I \cup I')^c}, S) = I(Z_I; X_I \mid Z_{(I \cup I')^c}, S) + I(Z_I; Z_{I'} \mid X_I, Z_{(I \cup I')^c}, S). \tag{A.3}$$

Note that $Z_I \to (X_I, Z_{(I \cup I')^c}, S) \to Z_{I'}$ forms a Markov chain (since $I$
and $I'$ are disjoint), i.e.,

$$I(Z_I; Z_{I'} \mid X_I, Z_{(I \cup I')^c}, S) = 0$$

in (A.3). Hence, equating the right-hand sides of (A.2) and (A.3), and rearranging, we obtain
(A.1). □

Lemma A.2 Suppose nonempty sets $I, I' \subseteq \{1,\dots,M\}$ are disjoint. Then

$$I(X_{I \cup I'}; Z_{I \cup I'} \mid Z_{(I \cup I')^c}, S) = I(X_I; Z_I \mid Z_{(I \cup I')^c}, S) + I(X_{I'}; Z_{I'} \mid Z_{{I'}^c}, S). \tag{A.4}$$

Proof: For any quadruple $(U_1, U_2, V_1, V_2)$ of random variables, we can write

$$I(U_1, U_2; V_1, V_2) = I(U_1, U_2; V_1) + I(U_1, U_2; V_2 \mid V_1)$$

$$= I(U_1; V_1) + I(U_2; V_1 \mid U_1) + I(U_2; V_2 \mid V_1) + I(U_1; V_2 \mid U_2, V_1) \tag{A.5}$$

by repeatedly applying the chain rule of mutual information. Identifying
$(U_1, U_2, V_1, V_2)$ with $(X_I, X_{I'}, Z_I, Z_{I'})$, and applying formula (A.5) (while
maintaining conditioning on $(Z_{(I \cup I')^c}, S)$ throughout), we obtain

$$I(X_{I \cup I'}; Z_{I \cup I'} \mid Z_{(I \cup I')^c}, S) = I(X_I; Z_I \mid Z_{(I \cup I')^c}, S) + I(X_{I'}; Z_I \mid X_I, Z_{(I \cup I')^c}, S)$$

$$+ I(X_{I'}; Z_{I'} \mid Z_{{I'}^c}, S) + I(X_I; Z_{I'} \mid X_{I'}, Z_{{I'}^c}, S). \tag{A.6}$$

In (A.6), $I(X_{I'}; Z_I \mid X_I, Z_{(I \cup I')^c}, S) = 0$ and
$I(X_I; Z_{I'} \mid X_{I'}, Z_{{I'}^c}, S) = 0$, respectively, because
$Z_I \to (X_I, Z_{(I \cup I')^c}, S) \to X_{I'}$ and
$Z_{I'} \to (X_{I'}, Z_{{I'}^c}, S) \to X_I$ form Markov chains (since $I$ and $I'$ are
disjoint). Hence the result. □

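More generally, any nonempty $\tilde{I} \subseteq \{1,\dots,M\}$ can play the role of
$\{1,\dots,M\}$ in the statement of Lemma A.2 so that ${I'}^c$ can be replaced by
$\tilde{I} \setminus I'$ and $(I \cup I')^c$ by $\tilde{I} \setminus (I \cup I')$. In that
case, Lemma A.2 immediately takes the following form:

Corollary A.3 Suppose sets $I, I' \subseteq \tilde{I}$ are disjoint, where
$\tilde{I} \subseteq \{1,\dots,M\}$ is nonempty. Then

$$I(X_{I \cup I'}; Z_{I \cup I'} \mid Z_{\tilde{I} \setminus (I \cup I')}, S) = I(X_I; Z_I \mid Z_{\tilde{I} \setminus (I \cup I')}, S) + I(X_{I'}; Z_{I'} \mid Z_{\tilde{I} \setminus I'}, S). \tag{A.7}$$

Now consider arbitrary nonempty $I \subseteq \{1,\dots,M\}$ with cardinality $|I| = m$, and
denote its elements by $i(1:m)$. Further, setting $\tilde{I} = \{1,\dots,M\}$, and letting
$(\{i(1)\}, I \setminus \{i(1)\})$ play the role of $(I, I')$ in (A.7), we have

$$I(X_I; Z_I \mid Z_{I^c}, S) = I(X_{i(1)}; Z_{i(1)} \mid Z_{I^c}, S) + I(X_{I \setminus \{i(1)\}}; Z_{I \setminus \{i(1)\}} \mid Z_{(I \setminus \{i(1)\})^c}, S). \tag{A.8}$$

Next replace $I$ by $I \setminus \{i(1)\} = \{i(2:m)\}$, let
$(\{i(2)\}, I \setminus \{i(1:2)\})$ play the role of $(I, I')$ in (A.7), and continue so as
to obtain

$$I(X_I; Z_I \mid Z_{I^c}, S) = \sum_{j=1}^{m} I(X_{i(j)}; Z_{i(j)} \mid Z_{(I \setminus \{i(1:j-1)\})^c}, S). \tag{A.9}$$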
Noting $(I \setminus \{i(1:j-1)\})^c = \{1,\dots,M\} \setminus \{i(j:m)\}$ in (A.9), we have
the following result.

Corollary A.4 Suppose nonempty set $I \subseteq \{1,\dots,M\}$ has cardinality $|I| = m$
($1 \le m \le M$), and denote elements of $I$ by $i(j)$, $1 \le j \le m$. Then

$$I(X_I; Z_I \mid Z_{I^c}, S) = \sum_{j=1}^{m} I(X_{i(j)}; Z_{i(j)} \mid Z_{\{1,\dots,M\} \setminus \{i(j:m)\}}, S). \tag{A.10}$$

Further, suppose $\tilde{I} = \{1,\dots,m\}$ for some $2 \le m \le M$. For the choice
$I = \{1,\dots,m-1\}$ and $I' = \{m\}$, (A.7) becomes

$$I(X_{\{1,\dots,m\}}; Z_{\{1,\dots,m\}} \mid S) = I(X_{\{1,\dots,m-1\}}; Z_{\{1,\dots,m-1\}} \mid S) + I(X_m; Z_m \mid Z_{\{1,\dots,m-1\}}, S), \tag{A.11}$$

which provides a useful chain rule. Applying this repeatedly, we obtain the following.

Corollary A.5 For any $1 \le m \le M$,

$$I(X_{\{1,\dots,m\}}; Z_{\{1,\dots,m\}} \mid S) = \sum_{i=1}^{m} I(X_i; Z_i \mid Z_{\{1,\dots,i-1\}}, S). \tag{A.12}$$

In fact, Corollary A.5 can be further generalized as follows. For any $1 \le m < M$, set
$\tilde{I} = \{1,\dots,M\}$, $I = \{1,\dots,m\}$ and $I' = \{m+1,\dots,M\}$ in Corollary A.3
to obtain

$$I(X_{\{1,\dots,M\}}; Z_{\{1,\dots,M\}} \mid S) = I(X_{\{1,\dots,m\}}; Z_{\{1,\dots,m\}} \mid S) + I(X_{\{m+1,\dots,M\}}; Z_{\{m+1,\dots,M\}} \mid Z_{\{1,\dots,m\}}, S). \tag{A.13}$$

Expanding $I(X_{\{1,\dots,M\}}; Z_{\{1,\dots,M\}} \mid S)$ and
$I(X_{\{1,\dots,m\}}; Z_{\{1,\dots,m\}} \mid S)$ using Corollary A.5, from (A.13) we obtain
the following.

Corollary A.6 For any $1 \le m < M$,

$$I(X_{\{m+1,\dots,M\}}; Z_{\{m+1,\dots,M\}} \mid Z_{\{1,\dots,m\}}, S) = \sum_{i=m+1}^{M} I(X_i; Z_i \mid Z_{\{1,\dots,i-1\}}, S). \tag{A.14}$$

Lemma A.7 For any nonempty set $I \subseteq \{1,\dots,M\}$,

$$I(X_I; Z_I \mid Z_{I^c}, S) \le \sum_{i \in I} I(X_i; Z_i \mid Z_{\{1,\dots,i-1\}}, S). \tag{A.15}$$

Proof. Denote $m = |I|$, and let the elements $i(1), i(2), \dots, i(m)$ of $I$ be arranged in
ascending order. Consequently, note

$$\{1,\dots,i(j)-1\} \subseteq \{1,\dots,M\} \setminus \{i(j:m)\}, \quad 1 \le j \le m. \tag{A.16}$$

Therefore, we have

$$\{1,\dots,M\} \setminus \{i(j:m)\} = \{1,\dots,i(j)-1\} \cup \bar{I}(j), \tag{A.17}$$

where

$$\bar{I}(j) = (\{1,\dots,M\} \setminus \{i(j:m)\}) \setminus \{1,\dots,i(j)-1\}, \quad 1 \le j \le m.$$

In view of (A.17), we can write

$$I(X_{i(j)}; Z_{i(j)} \mid Z_{\{1,\dots,M\} \setminus \{i(j:m)\}}, S) = I(X_{i(j)}; Z_{i(j)} \mid Z_{\{1,\dots,i(j)-1\}}, Z_{\bar{I}(j)}, S)$$

$$= H(Z_{i(j)} \mid Z_{\{1,\dots,i(j)-1\}}, Z_{\bar{I}(j)}, S) - H(Z_{i(j)} \mid X_{i(j)}, Z_{\{1,\dots,i(j)-1\}}, Z_{\bar{I}(j)}, S)$$

$$= H(Z_{i(j)} \mid Z_{\{1,\dots,i(j)-1\}}, Z_{\bar{I}(j)}, S) - H(Z_{i(j)} \mid X_{i(j)}, Z_{\{1,\dots,i(j)-1\}}, S) \tag{A.18}$$

$$\le H(Z_{i(j)} \mid Z_{\{1,\dots,i(j)-1\}}, S) - H(Z_{i(j)} \mid X_{i(j)}, Z_{\{1,\dots,i(j)-1\}}, S) \tag{A.19}$$

$$= I(X_{i(j)}; Z_{i(j)} \mid Z_{\{1,\dots,i(j)-1\}}, S). \tag{A.20}$$

Here (A.18) follows by noting

$$H(Z_{i(j)} \mid X_{i(j)}, Z_{\{1,\dots,i(j)-1\}}, Z_{\bar{I}(j)}, S) = H(Z_{i(j)} \mid X_{i(j)}) = H(Z_{i(j)} \mid X_{i(j)}, Z_{\{1,\dots,i(j)-1\}}, S)$$

due to the fact that
$Z_{i(j)} \to X_{i(j)} \to (Z_{\{1,\dots,M\} \setminus \{i(j:m)\}}, S)$ forms a Markov chain.
Further, (A.19) follows because conditioning reduces entropy. Now summing (A.20) over
$1 \le j \le m$, we obtain

$$\sum_{j=1}^{m} I(X_{i(j)}; Z_{i(j)} \mid Z_{\{1,\dots,M\} \setminus \{i(j:m)\}}, S) \le \sum_{j=1}^{m} I(X_{i(j)}; Z_{i(j)} \mid Z_{\{1,\dots,i(j)-1\}}, S). \tag{A.21}$$

By Corollary A.4, the left-hand side of (A.21) equals $I(X_I; Z_I \mid Z_{I^c}, S)$. Also,
note that the right-hand side of (A.21) is the same as the right-hand side of (A.15). Hence
(A.21) is the desired result. □

Proof of Lemma 3.1: It is enough to show the following: if we have
$I \setminus I' \ne \emptyset$ as well as $I' \setminus I \ne \emptyset$, then there exists
no rate M-vector $R_{\{1,\dots,M\}} \in \mathcal{B}^*$ such that (3.3) and (3.4) hold
simultaneously. To prove this, first we assume that (3.3) and (3.4) hold for some
$R_{\{1,\dots,M\}} \in \mathcal{B}^*$ and some $(I, I')$ with the aforementioned property,
and then detect a contradiction.

First consider the case where $I$ and $I'$ are disjoint. Using (A.1) in (A.4), we obtain

$$I(X_{I \cup I'}; Z_{I \cup I'} \mid Z_{(I \cup I')^c}, S) = I(X_I; Z_I \mid Z_{I^c}, S) + I(X_{I'}; Z_{I'} \mid Z_{{I'}^c}, S) + I(Z_I; Z_{I'} \mid Z_{(I \cup I')^c}, S). \tag{A.22}$$

Now, adding (3.3) and (3.4) and comparing with (A.22), we have

$$\sum_{i \in I \cup I'} R_i = I(X_{I \cup I'}; Z_{I \cup I'} \mid Z_{(I \cup I')^c}, S) - I(Z_I; Z_{I'} \mid Z_{(I \cup I')^c}, S) < I(X_{I \cup I'}; Z_{I \cup I'} \mid Z_{(I \cup I')^c}, S), \tag{A.23}$$

because $(Z_I, (Z_{(I \cup I')^c}, S), Z_{I'})$ does not form a Markov chain. Note that (A.23)
contradicts condition (3.2) with $I \cup I'$ playing the role of $I$.

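Next consider the case where $I \cap I' = \tilde{I} \ne \emptyset$. Writing
$I = (I \setminus \tilde{I}) \cup \tilde{I}$, from (3.3), we have

$$\sum_{i \in I \setminus \tilde{I}} R_i + \sum_{i \in \tilde{I}} R_i = I(X_I; Z_I \mid Z_{I^c}, S)$$

$$= I(X_{I \setminus \tilde{I}}; Z_{I \setminus \tilde{I}} \mid Z_{(I \setminus \tilde{I})^c}, S) + I(X_{\tilde{I}}; Z_{\tilde{I}} \mid Z_{\tilde{I}^c}, S) + I(Z_{I \setminus \tilde{I}}; Z_{\tilde{I}} \mid Z_{I^c}, S) \tag{A.24}$$

in the same manner as (A.22) with $(I \setminus \tilde{I}, \tilde{I})$ playing the role of
$(I, I')$. Further, from (3.2), note that

$$\sum_{i \in \tilde{I}} R_i \ge I(X_{\tilde{I}}; Z_{\tilde{I}} \mid Z_{\tilde{I}^c}, S). \tag{A.25}$$

Using (A.25) in (A.24), we have

$$\sum_{i \in I \setminus I'} R_i = \sum_{i \in I \setminus \tilde{I}} R_i \le I(X_{I \setminus \tilde{I}}; Z_{I \setminus \tilde{I}} \mid Z_{(I \setminus \tilde{I})^c}, S) + I(Z_{I \setminus \tilde{I}}; Z_{\tilde{I}} \mid Z_{I^c}, S). \tag{A.26}$$

Adding (A.26) and (3.4), we obtain

$$\sum_{i \in I \setminus I'} R_i + \sum_{i \in I'} R_i = \sum_{i \in I \cup I'} R_i$$

$$\le I(X_{I \setminus \tilde{I}}; Z_{I \setminus \tilde{I}} \mid Z_{(I \setminus \tilde{I})^c}, S) + I(Z_{I \setminus \tilde{I}}; Z_{\tilde{I}} \mid Z_{I^c}, S) + I(X_{I'}; Z_{I'} \mid Z_{{I'}^c}, S)$$

$$= I(X_{I \cup I'}; Z_{I \cup I'} \mid Z_{(I \cup I')^c}, S) - I(Z_{I \setminus \tilde{I}}; Z_{I'} \mid Z_{(I \cup I')^c}, S) + I(Z_{I \setminus \tilde{I}}; Z_{\tilde{I}} \mid Z_{I^c}, S), \tag{A.27}$$

where the last step follows by comparing with (A.22), and letting
$(I \setminus \tilde{I}, I')$ play the role of $(I, I')$. Further, expand

$$I(Z_{I \setminus \tilde{I}}; Z_{I'} \mid Z_{(I \cup I')^c}, S) = H(Z_{I \setminus \tilde{I}} \mid Z_{(I \cup I')^c}, S) - H(Z_{I \setminus \tilde{I}} \mid Z_{I' \cup (I \cup I')^c}, S) \tag{A.28}$$

$$I(Z_{I \setminus \tilde{I}}; Z_{\tilde{I}} \mid Z_{I^c}, S) = H(Z_{I \setminus \tilde{I}} \mid Z_{I^c}, S) - H(Z_{I \setminus \tilde{I}} \mid Z_{\tilde{I} \cup I^c}, S). \tag{A.29}$$

Now, note $I' \cup (I \cup I')^c = \tilde{I} \cup I^c$, and subtract (A.29) from (A.28) to
obtain

$$I(Z_{I \setminus \tilde{I}}; Z_{I'} \mid Z_{(I \cup I')^c}, S) - I(Z_{I \setminus \tilde{I}}; Z_{\tilde{I}} \mid Z_{I^c}, S) = H(Z_{I \setminus \tilde{I}} \mid Z_{(I \cup I')^c}, S) - H(Z_{I \setminus \tilde{I}} \mid Z_{I^c}, S)$$

$$= I(Z_{I \setminus \tilde{I}}; Z_{I' \setminus I} \mid Z_{(I \cup I')^c}, S) \tag{A.30}$$

$$> 0. \tag{A.31}$$

Here (A.30) follows by noting $I^c = (I \cup I')^c \cup (I' \setminus I)$ with
$(I \cup I')^c$ and $I' \setminus I$ disjoint. Further, (A.31) follows due to the fact that
$(Z_{I \setminus \tilde{I}}, (Z_{(I \cup I')^c}, S), Z_{I' \setminus I})$ do not form a
Markov chain. Using (A.31) in (A.27), we again obtain (A.23), which as earlier contradicts
(3.2).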